College Trends

College majors and employment

The American Community Survey is a survey run by the US Census Bureau that collects data on everything from the affordability of housing to employment rates for different industries. For this experiment, I will be using data derived from the American Community Survey for the years 2010-2012. The team at FiveThirtyEight has cleaned the dataset and made it available on their GitHub repo.

Here's a quick overview of the files I'll be working with:

all-ages.csv - employment data by major for all ages
recent-grads.csv - employment data by major for just recent college graduates

This challenge tests my comfort with pandas for manipulating DataFrames and calculating summary statistics.



In [1]:

    
import pandas as pd

all_ages = pd.read_csv("all-ages.csv")
all_ages.head(5)









    Out[1]:






  
    
      
	Major_code	Major	Major_category	Total	Employed	Employed_full_time_year_round	Unemployed	Unemployment_rate	Median	P25th	P75th
0	1100	GENERAL AGRICULTURE	Agriculture & Natural Resources	128148	90245	74078	2423	0.026147	50000	34000	80000
1	1101	AGRICULTURE PRODUCTION AND MANAGEMENT	Agriculture & Natural Resources	95326	76865	64240	2266	0.028636	54000	36000	80000
2	1102	AGRICULTURAL ECONOMICS	Agriculture & Natural Resources	33955	26321	22810	821	0.030248	63000	40000	98000
3	1103	ANIMAL SCIENCES	Agriculture & Natural Resources	103549	81177	64937	3619	0.042679	46000	30000	72000
4	1104	FOOD SCIENCE	Agriculture & Natural Resources	24280	17281	12722	894	0.049188	62000	38500	90000

Summarizing major categories

In both of these datasets, majors are grouped into categories. As you may have noticed, there are multiple rows with a common value for Major_category but different values for Major. We would like to know the total number of people in each Major_category for both datasets.

I will use the Total column to calculate the number of people who fall under each Major_category and store the result as a separate dictionary for each dataset. The key for each dictionary should be the Major_category and the value should be the total count. The counts from all_ages go in a dictionary named all_ages_major_categories, and the counts from recent_grads go in a dictionary named recent_grads_major_categories.



In [29]:

    
all_ages = pd.read_csv("all-ages.csv")
# Sum Total per category; sort_values replaces the removed DataFrame.sort
all_ages_totals = all_ages.pivot_table(index="Major_category", values="Total", aggfunc="sum")["Total"].sort_values(ascending=False)
all_ages_totals









    Out[29]:





Major_category
Business                               9858741
Education                              4700118
Humanities & Liberal Arts              3738335
Engineering                            3576013
Health                                 2950859
Social Science                         2654125
Psychology & Social Work               1987278
Arts                                   1805865
Communications & Journalism            1803822
Computers & Mathematics                1781378
Biology & Life Science                 1338186
Industrial Arts & Consumer Services    1033798
Physical Sciences                      1025318
Law & Public Policy                     902926
Agriculture & Natural Resources         632437
Interdisciplinary                        45199
Name: Total, dtype: int64
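
The task asks for a dictionary, while the cell above returns a Series. One way to bridge the two, sketched here on a small made-up frame (the rows and totals are illustrative and not taken from the dataset, and the name toy_major_categories is hypothetical), is to sum Total within each Major_category and call to_dict() on the result:

```python
import pandas as pd

# Illustrative rows only; the real data comes from all-ages.csv.
toy = pd.DataFrame({
    "Major": ["A", "B", "C"],
    "Major_category": ["Arts", "Arts", "Health"],
    "Total": [100, 50, 200],
})

# Sum Total within each category, then convert the Series to a plain dict
# keyed by Major_category.
toy_major_categories = toy.groupby("Major_category")["Total"].sum().to_dict()
print(toy_major_categories)
```

The same pattern applied to all_ages and recent_grads would produce all_ages_major_categories and recent_grads_major_categories.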



In [30]:

    
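recent_grads = pd.read_csv("recent-grads.csv")
# Same aggregation as for all_ages, using sort_values instead of the removed DataFrame.sort
recent_totals = recent_grads.pivot_table(index="Major_category", values="Total", aggfunc="sum")["Total"].sort_values(ascending=False)
recent_totals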









    Out[30]:





Major_category
Business                               1302376
Humanities & Liberal Arts               713468
Education                               559129
Engineering                             537583
Social Science                          529966
Psychology & Social Work                481007
Health                                  463230
Biology & Life Science                  453862
Communications & Journalism             392601
Arts                                    357130
Computers & Mathematics                 299008
Industrial Arts & Consumer Services     229792
Physical Sciences                       185479
Law & Public Policy                     179107
Agriculture & Natural Resources          79981
Interdisciplinary                        12296
Name: Total, dtype: int64

Low-wage job rates

The press likes to talk a lot about how many college grads are unable to get higher-wage, skilled jobs and end up working lower-wage, unskilled jobs instead. As a data person, it is my job to be skeptical of broad claims and to explore whether relevant data gives a more nuanced view. Let's run some basic calculations to explore that idea further.

I will use the Low_wage_jobs and Total columns to calculate the proportion of recent college graduates who worked low-wage jobs and store the resulting float as low_wage_percent.



In [19]:

    
recent_grads = pd.read_csv("recent-grads.csv")

# Number of recent grads in low-wage jobs, summed across all majors
low_wage_sum = float(recent_grads["Low_wage_jobs"].sum())
# Note: the denominator here is Employed, not Total
recent_sum = float(recent_grads["Employed"].sum())

low_wage_percent = low_wage_sum / recent_sum
low_wage_percent









    Out[19]:





0.12371514957893746

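So it looks like about 12.4% of new grads are working in low-wage jobs. Because the cell above divides by the Employed column, this is a share of employed recent grads rather than of everyone counted in Total.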
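
The task statement mentions the Total column, while the cell above divides by Employed, and the two denominators answer slightly different questions. A minimal sketch of the difference, using made-up numbers rather than values from recent-grads.csv (the names share_of_total and share_of_employed are hypothetical):

```python
import pandas as pd

# Illustrative numbers only; not taken from recent-grads.csv.
toy = pd.DataFrame({
    "Total": [1000, 500],
    "Employed": [800, 400],
    "Low_wage_jobs": [100, 50],
})

low_wage = toy["Low_wage_jobs"].sum()

# Share of all graduates who hold low-wage jobs.
share_of_total = low_wage / toy["Total"].sum()
# Share of employed graduates who hold low-wage jobs.
share_of_employed = low_wage / toy["Employed"].sum()

print(round(share_of_total, 3), round(share_of_employed, 3))
```

Since Employed can be no larger than Total in this kind of data, the employed-only share will be at least as large as the share of all graduates.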

Comparing datasets

Both all_ages and recent_grads datasets have 173 rows, corresponding to the 173 college major codes. This enables us to do some comparisons between the two datasets and perform some initial calculations to see how similar or different the statistics of recent college graduates are from those of the entire population.

We want to know the number of majors where recent grads fare better than the overall population. For each major, determine if the Unemployment_rate is lower for recent_grads or for all_ages and increment either recent_grads_lower_emp_count or all_ages_lower_emp_count respectively.



In [41]:

    
# All majors, common to both DataFrames
majors = recent_grads['Major'].value_counts().index

recent_grads_lower_emp = []  # majors where recent grads have the lower unemployment rate
all_ages_lower_emp = []      # majors where the overall population has the lower rate

for major in majors:
    recent_unemploy_rate = recent_grads[recent_grads["Major"] == major]["Unemployment_rate"].values[0]
    all_ages_unemploy_rate = all_ages[all_ages["Major"] == major]["Unemployment_rate"].values[0]
    diff = recent_unemploy_rate - all_ages_unemploy_rate  # negative means recent grads fare better

    if diff < 0:
        recent_grads_lower_emp.append(major)
    elif diff > 0:
        all_ages_lower_emp.append(major)
    # equal rates are added to neither list
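
The loop above filters both DataFrames once per major. As an alternative sketch, assuming each major appears exactly once in each table, the two tables can be merged on Major and compared in a single vectorized step. The frames below hold made-up rates, and the names recent, overall, recent_lower, all_lower and ties are hypothetical:

```python
import pandas as pd

# Illustrative rates only; the real values come from the two CSV files.
recent = pd.DataFrame({
    "Major": ["A", "B", "C"],
    "Unemployment_rate": [0.05, 0.08, 0.04],
})
overall = pd.DataFrame({
    "Major": ["A", "B", "C"],
    "Unemployment_rate": [0.06, 0.05, 0.04],
})

# Align the two tables on Major so each row holds both rates.
both = recent.merge(overall, on="Major", suffixes=("_recent", "_all"))
diff = both["Unemployment_rate_recent"] - both["Unemployment_rate_all"]

recent_lower = both.loc[diff < 0, "Major"].tolist()  # recent grads fare better
all_lower = both.loc[diff > 0, "Major"].tolist()     # overall population fares better
ties = int((diff == 0).sum())                        # equal rates, counted separately

print(recent_lower, all_lower, ties)
```

Counting ties explicitly makes it easy to check that the three groups add up to the number of majors.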



In [42]:

    
len(recent_grads_lower_emp)









    Out[42]:





43



In [43]:

    
len(all_ages_lower_emp)









    Out[43]:





128

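So it looks like new grads have a lower unemployment rate than the overall population in only 43 of the 173 majors, while the overall population does better in 128; the remaining 2 majors fall into neither list. This fits the old adage that experience is key in the job search. Let's take a look at which majors favor new grads: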



In [44]:

    
recent_grads_lower_emp









    Out[44]:





['HUMAN SERVICES AND COMMUNITY ORGANIZATION',
 'ART AND MUSIC EDUCATION',
 'ASTRONOMY AND ASTROPHYSICS',
 'MISCELLANEOUS ENGINEERING TECHNOLOGIES',
 'UNITED STATES HISTORY',
 'SOCIAL PSYCHOLOGY',
 'SOIL SCIENCE',
 'COUNSELING PSYCHOLOGY',
 'INDUSTRIAL AND MANUFACTURING ENGINEERING',
 'PHYSICS',
 'CHEMISTRY',
 'ATMOSPHERIC SCIENCES AND METEOROLOGY',
 'EDUCATIONAL PSYCHOLOGY',
 'PHYSICAL SCIENCES',
 'MISCELLANEOUS PSYCHOLOGY',
 'EARLY CHILDHOOD EDUCATION',
 'DRAMA AND THEATER ARTS',
 'NEUROSCIENCE',
 'GEOSCIENCES',
 'HUMAN RESOURCES AND PERSONNEL MANAGEMENT',
 'MATHEMATICS',
 'ARCHITECTURAL ENGINEERING',
 'MATHEMATICS AND COMPUTER SCIENCE',
 'COURT REPORTING',
 'SPECIAL NEEDS EDUCATION',
 'MATHEMATICS TEACHER EDUCATION',
 'GENETICS',
 'ENGINEERING AND INDUSTRIAL MANAGEMENT',
 'HUMANITIES',
 'AREA ETHNIC AND CIVILIZATION STUDIES',
 'INDUSTRIAL PRODUCTION TECHNOLOGIES',
 'GENERAL AGRICULTURE',
 'ART HISTORY AND CRITICISM',
 'ENGINEERING MECHANICS PHYSICS AND SCIENCE',
 'METALLURGICAL ENGINEERING',
 'MULTI/INTERDISCIPLINARY STUDIES',
 'ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION',
 'MISCELLANEOUS FINE ARTS',
 'ZOOLOGY',
 'HEALTH AND MEDICAL PREPARATORY PROGRAMS',
 'PETROLEUM ENGINEERING',
 'MATERIALS ENGINEERING AND MATERIALS SCIENCE',
 'BOTANY']



In [45]:

    
all_ages_lower_emp









    Out[45]:





['AEROSPACE ENGINEERING',
 'PLANT SCIENCE AND AGRONOMY',
 'GENERAL MEDICAL AND HEALTH SERVICES',
 'ELECTRICAL ENGINEERING',
 'COMMUNICATION TECHNOLOGIES',
 'GEOGRAPHY',
 'AGRICULTURE PRODUCTION AND MANAGEMENT',
 'NUCLEAR ENGINEERING',
 'MASS MEDIA',
 'AGRICULTURAL ECONOMICS',
 'MISCELLANEOUS SOCIAL SCIENCES',
 'FOOD SCIENCE',
 'VISUAL AND PERFORMING ARTS',
 'ENGINEERING TECHNOLOGIES',
 'MOLECULAR BIOLOGY',
 'COMPUTER NETWORKING AND TELECOMMUNICATIONS',
 'PHYSICAL AND HEALTH EDUCATION TEACHING',
 'BIOLOGY',
 'ECONOMICS',
 'SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION',
 'ENVIRONMENTAL ENGINEERING',
 'TRANSPORTATION SCIENCES AND TECHNOLOGIES',
 'HEALTH AND MEDICAL ADMINISTRATIVE SERVICES',
 'ADVERTISING AND PUBLIC RELATIONS',
 'COMPUTER PROGRAMMING AND DATA PROCESSING',
 'POLITICAL SCIENCE AND GOVERNMENT',
 'FINANCE',
 'INTERNATIONAL BUSINESS',
 'COMMUNICATIONS',
 'BIOCHEMICAL SCIENCES',
 'MUSIC',
 'GEOLOGICAL AND GEOPHYSICAL ENGINEERING',
 'NATURAL RESOURCES MANAGEMENT',
 'TREATMENT THERAPY PROFESSIONS',
 'COMMUNICATION DISORDERS SCIENCES AND SERVICES',
 'PHYSIOLOGY',
 'MISCELLANEOUS HEALTH MEDICAL PROFESSIONS',
 'PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION',
 'GENERAL ENGINEERING',
 'COGNITIVE SCIENCE AND BIOPSYCHOLOGY',
 'STUDIO ARTS',
 'MEDICAL TECHNOLOGIES TECHNICIANS',
 'COMPUTER SCIENCE',
 'COMPUTER ENGINEERING',
 'COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY',
 'CRIMINOLOGY',
 'LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE',
 'MISCELLANEOUS BIOLOGY',
 'MINING AND MINERAL ENGINEERING',
 'INTERNATIONAL RELATIONS',
 'ARCHITECTURE',
 'ECOLOGY',
 'OCEANOGRAPHY',
 'NURSING',
 'ANIMAL SCIENCES',
 'SCIENCE AND COMPUTER TEACHER EDUCATION',
 'THEOLOGY AND RELIGIOUS VOCATIONS',
 'CONSTRUCTION SERVICES',
 'BUSINESS ECONOMICS',
 'SOCIAL WORK',
 'MARKETING AND MARKETING RESEARCH',
 'NUTRITION SCIENCES',
 'COMMUNITY AND PUBLIC HEALTH',
 'CIVIL ENGINEERING',
 'FORESTRY',
 'ELEMENTARY EDUCATION',
 'MISCELLANEOUS AGRICULTURE',
 'JOURNALISM',
 'OTHER FOREIGN LANGUAGES',
 'ACCOUNTING',
 'MATERIALS SCIENCE',
 'ELECTRICAL ENGINEERING TECHNOLOGY',
 'LANGUAGE AND DRAMA EDUCATION',
 'PSYCHOLOGY',
 'OPERATIONS LOGISTICS AND E-COMMERCE',
 'APPLIED MATHEMATICS',
 'ENGLISH LANGUAGE AND LITERATURE',
 'FAMILY AND CONSUMER SCIENCES',
 'PHARMACOLOGY',
 'NAVAL ARCHITECTURE AND MARINE ENGINEERING',
 'SOCIOLOGY',
 'SCHOOL STUDENT COUNSELING',
 'COMPOSITION AND RHETORIC',
 'FILM VIDEO AND PHOTOGRAPHIC ARTS',
 'MISCELLANEOUS ENGINEERING',
 'BIOMEDICAL ENGINEERING',
 'INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY',
 'LIBERAL ARTS',
 'COMMERCIAL ART AND GRAPHIC DESIGN',
 'BIOLOGICAL ENGINEERING',
 'PRE-LAW AND LEGAL STUDIES',
 'PHILOSOPHY AND RELIGIOUS STUDIES',
 'ENVIRONMENTAL SCIENCE',
 'PHYSICAL FITNESS PARKS RECREATION AND LEISURE',
 'STATISTICS AND DECISION SCIENCE',
 'MECHANICAL ENGINEERING RELATED TECHNOLOGIES',
 'HISTORY',
 'FINE ARTS',
 'TEACHER EDUCATION: MULTIPLE LEVELS',
 'NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES',
 'MANAGEMENT INFORMATION SYSTEMS AND STATISTICS',
 'GENERAL EDUCATION',
 'PUBLIC POLICY',
 'COSMETOLOGY SERVICES AND CULINARY ARTS',
 'MEDICAL ASSISTING SERVICES',
 'LIBRARY SCIENCE',
 'HOSPITALITY MANAGEMENT',
 'ACTUARIAL SCIENCE',
 'BUSINESS MANAGEMENT AND ADMINISTRATION',
 'INTERDISCIPLINARY SOCIAL SCIENCES',
 'CLINICAL PSYCHOLOGY',
 'MECHANICAL ENGINEERING',
 'ANTHROPOLOGY AND ARCHEOLOGY',
 'INTERCULTURAL AND INTERNATIONAL STUDIES',
 'MISCELLANEOUS EDUCATION',
 'PUBLIC ADMINISTRATION',
 'MULTI-DISCIPLINARY OR GENERAL SCIENCE',
 'CRIMINAL JUSTICE AND FIRE PROTECTION',
 'GENERAL BUSINESS',
 'CHEMICAL ENGINEERING',
 'SECONDARY TEACHER EDUCATION',
 'MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION',
 'FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES',
 'MICROBIOLOGY',
 'COMPUTER AND INFORMATION SYSTEMS',
 'GENERAL SOCIAL SCIENCES',
 'GEOLOGY AND EARTH SCIENCE',
 'INFORMATION SCIENCES']


